Learning to maximize reward rate: a model based on semi-Markov decision processes

Authors

Abstract

Similar resources

Learning to maximize reward rate: a model based on semi-Markov decision processes

When animals have to make a number of decisions during a limited time interval, they face a fundamental problem: how much time they should spend on each decision in order to achieve the maximum possible total outcome. Deliberating more on one decision usually leads to more outcome, but less time will remain for other decisions. In the framework of sequential sampling models, the question is how ...

Semi-Markov Decision Processes

Considered are infinite-horizon semi-Markov decision processes (SMDPs) with finite state and action spaces. Total expected discounted reward and long-run average expected reward optimality criteria are reviewed. Solution methodology for each criterion is given; constraints and variance sensitivity are also discussed.

Semi-Markov Decision Processes

The previous chapter dealt with the discrete-time Markov decision model. In this model, decisions can be made only at fixed epochs t = 0, 1, .... However, in many stochastic control problems the times between the decision epochs are not constant but random. A possible tool for analysing such problems is the semi-Markov decision model. In Section 7.1 we discuss the basic elements of this model...

Solving Semi-Markov Decision Problems using Average Reward Reinforcement Learning

A large class of problems of sequential decision making under uncertainty, of which the underlying probability structure is a Markov process, can be modeled as stochastic dynamic programs (referred to, in general, as Markov decision problems or MDPs). However, the computational complexity of the classical MDP algorithms, such as value iteration and policy iteration, is prohibitive and can grow ...

Markov Decision Processes with Arbitrary Reward Processes

We consider a learning problem where the decision maker interacts with a standard Markov decision process, with the exception that the reward functions vary arbitrarily over time. We show that, against every possible realization of the reward process, the agent can perform as well—in hindsight—as every stationary policy. This generalizes the classical no-regret result for repeated games. Specif...

Journal

Journal title: Frontiers in Neuroscience

Year: 2014

ISSN: 1662-453X

DOI: 10.3389/fnins.2014.00101